Modeling the Acoustic Correlates of Dialog Act for Expressive Chinese TTS Synthesis
Abstract
This paper proposes a novel approach for describing the expressivity of dialog text and modeling its acoustic correlates for expressive text-to-speech (TTS) synthesis. We use Dialog Acts (DAs) to describe expressivity. In particular, we set up a Wizard-of-Oz (WoZ) data collection framework to collect a tourism-domain corpus and annotated it with DAs. A pitch target model optimized for describing Mandarin F0 contours is used to model the pitch contour of Mandarin syllables. We then developed a Generalized Regression Neural Network (GRNN) based model that transforms the acoustic features of neutral speech (pitch target model parameters, duration, energy and pauses) to resemble expressive speech, according to the DA of the input text. Perceptual evaluation of the modified speech shows that over 63% of the utterances carry appropriate expressivity. An expressive Mean Opinion Score test also shows that the modified speech is more expressive than the neutral speech.
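To make the GRNN step concrete, the following is a minimal sketch of a generic GRNN regressor, which predicts a Gaussian-kernel-weighted average of the training targets. It is not the authors' implementation: the function name `grnn_predict`, the feature layout (normalized neutral F0 and duration plus a one-hot DA code), the target layout (scaling ratios) and all numbers are assumptions made up for illustration.

```python
import numpy as np


def grnn_predict(train_x, train_y, query_x, sigma=0.5):
    """Generic GRNN: Gaussian-kernel-weighted average of training targets."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y, dtype=float)
    if train_y.ndim == 1:
        train_y = train_y[:, None]  # treat a 1-D target as a single output column
    query_x = np.atleast_2d(np.asarray(query_x, dtype=float))
    # Squared Euclidean distance between every query and every training pattern.
    d2 = ((query_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    # Pattern layer: Gaussian activations (row minimum subtracted to avoid underflow).
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    # Summation and output layers: normalized weighted sum of the targets.
    return (w @ train_y) / w.sum(axis=1, keepdims=True)


# Hypothetical toy data (not from the paper).
# Input:  [neutral mean F0 (normalized), neutral duration (normalized), DA one-hot (2 classes)]
# Target: [F0 scaling ratio, duration scaling ratio]
train_x = [[0.2, 0.5, 1, 0], [0.8, 0.5, 1, 0], [0.2, 0.5, 0, 1], [0.8, 0.5, 0, 1]]
train_y = [[1.10, 0.95], [1.20, 0.90], [0.95, 1.05], [0.90, 1.10]]

pred = grnn_predict(train_x, train_y, [0.2, 0.5, 1, 0], sigma=0.1)
print(pred)
```

The only free parameter is the kernel width `sigma`: a small value makes the prediction follow the nearest training pattern, while a large value pulls it toward the mean of all training targets.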
Similar resources
Modeling the acoustic correlates of expressive elements in text genres for expressive text-to-speech synthesis
This paper proposes a novel approach for describing the expressive elements in text genres and modeling their acoustic correlates for expressive text-to-speech synthesis (TTS). We apply the three-dimensional PAD (pleasure-displeasure, arousal-nonarousal and dominance-submissiveness) model in describing expressivity. In particular, we define a set of principles for annotating the P and A values ...
Speech acts and dialog TTS
The approach outlined in this paper aims to provide better expressivity of unit selection TTS for dialog intended applications while retaining the natural sounding voice quality typical of unit selection synthesis. A small set of speech acts were used to annotate a corpus from one female US English speaker. The corpus was composed of speech read primarily from interactive dialogs of various kin...
Enriching Text-to-Speech Synthesis Using Automatic Dialog Act Tags
We present an approach for enriching dialog based text-to-speech (TTS) synthesis systems by explicitly controlling the expressiveness through the use of dialog act tags. The dialog act tags in our framework are automatically obtained by training a maximum entropy classifier on the Switchboard-DAMSL data set, unrelated to the TTS database. We compare the voice quality produced by exploiting autom...
Dialog speech acts and prosody: Considerations for TTS
As natural language dialog systems involving both speech recognition and text-to-speech (TTS) synthesis become more sophisticated, the limitations of general-purpose TTS for human-computer dialogs have become more apparent. Much subtlety and complexity of meaning in natural language dialogs is conveyed by prosody; how something is said is often as important as what words are spoken. At the same...
Application of expressive TTS synthesis in an advanced ECA system
The research project COMPANIONS aims at developing an advanced embodied conversational agent (ECA). This ECA is used in two scenarios and two languages (English and Czech), and it requires a TTS system being able to generate very natural expressive and emotional speech output. This paper describes application issues of two such systems within the ECA, introduces approaches to expressive speech ...
Publication date: 2008